These are the basic mixed models we used as a framework to analyze all the MLH1 count data.
\[mouse \ av.\ metric ~=~ subsp * sex + rand(strain) + \varepsilon \]
The mixed model lets us separate the effects to see how each one affects a mouse’s mean CO count. The subspecies effect is a proxy for divergence and the random strain effect is a proxy for polymorphism.
Setting up the data set for the mixed model (base data set, HQ data set):
Only Mus musculus strains with sex matched observations
Male gwRR values have been adjusted for the PAR
Q5 excluded
All the effects are unordered factors
Male ages within 5 - 20 weeks
Molossinus is coded as a subsp level
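The filtering steps listed above could be sketched roughly as follows. The data frame `mice` and its columns (`strain`, `subsp`, `sex`, `age_weeks`, `quality`) are hypothetical stand-ins for illustration, not the names used in the actual setup script, and the PAR adjustment of male gwRR is not shown.

```r
# Toy mouse-level table; every column name here is an assumption.
mice <- data.frame(
  strain    = c("WSB", "WSB", "PWD", "PWD", "TOM", "WSB"),
  subsp     = c("Dom", "Dom", "Musc", "Musc", "Musc", "Dom"),
  sex       = c("female", "male", "female", "male", "male", "male"),
  age_weeks = c(NA, 8, NA, 12, 10, 30),
  quality   = c(1, 2, 3, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Drop quality bin 5 (Q5) and males outside 5-20 weeks;
# females are kept regardless of age.
keep_age <- mice$sex == "female" |
  (mice$age_weeks >= 5 & mice$age_weeks <= 20)
hq <- mice[mice$quality != 5 & keep_age, ]

# Keep only strains observed in both sexes.
n_sexes <- tapply(hq$sex, hq$strain, function(s) length(unique(s)))
hq <- hq[hq$strain %in% names(n_sexes)[n_sexes == 2], ]

# Code every effect as an unordered factor.
hq$strain <- factor(hq$strain)
hq$subsp  <- factor(hq$subsp)
hq$sex    <- factor(hq$sex)

table(hq$sex, hq$strain)
```

In this toy table the male-only strain and the 30-week-old male are dropped, leaving one female and one male for each of two strains.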
This is the breakdown of mice used within the mixed model
| | WSB | G | LEW | PWD | MSM | MOLF | SKIVE | KAZ |
|---|---|---|---|---|---|---|---|---|
| female | 14 | 12 | 9 | 15 | 14 | 1 | 1 | 9 |
| male | 10 | 11 | 7 | 8 | 4 | 6 | 5 | 11 |
subspecies levels: 3 (Musc, Dom and Molossinus)
strain levels: Dom: 3, Musc: 3, Mol: 2
sex levels: 2 (female, male)
For now I will focus on 3 different models for comparing the subsp, sex and strain effects across relevant dependent variables.
Allows a random effect from strain. Considers the wild-derived inbred strains as random samples from each subspecies territory.
Post hoc model for investigating strain as a fixed effect.
Used after it’s apparent strain has a significant effect - another post-hoc examination for strain as a fixed effect.
\[mouse \ mean \ CO ~=~ subsp * sex + rand(strain) + \varepsilon\]
##
## simulated finite sample distribution of RLRT.
##
## (p-value based on 10000 simulated values)
##
## data:
## RLRT = 26.933, p-value < 2.2e-16
Above is the RLRT output for the mixed model M1, which tests the random strain effect.
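A minimal sketch of how a model like M1 and its RLRT could be fit, assuming the lme4 and RLRsim packages (the `lmerMod` summaries and the RLRT printout in this document are consistent with `lmer()` and `exactRLRT()`). The data are simulated and the column name `mean_co` is invented, so this is a reconstruction for illustration and not the original chunk.

```r
library(lme4)    # lmer()
library(RLRsim)  # exactRLRT()

set.seed(1)

# Simulated stand-in data: 8 strains nested in 3 subspecies, both sexes,
# 8 mice per strain-by-sex cell. All values are made up.
strains  <- c("WSB", "G", "LEW", "PWD", "SKIVE", "KAZ", "MSM", "MOLF")
subsp_of <- c(WSB = "Dom", G = "Dom", LEW = "Dom", PWD = "Musc",
              SKIVE = "Musc", KAZ = "Musc", MSM = "Mol", MOLF = "Mol")
toy <- expand.grid(strain = strains, sex = c("female", "male"),
                   rep = 1:8, stringsAsFactors = FALSE)
toy$subsp <- unname(subsp_of[toy$strain])

strain_shift <- setNames(rnorm(8, sd = 2), strains)
toy$mean_co  <- 25 - (toy$sex == "male") +
  unname(strain_shift[toy$strain]) + rnorm(nrow(toy))

# Fixed subsp * sex, random intercept for strain (REML fit by default).
m1 <- lmer(mean_co ~ subsp * sex + (1 | strain), data = toy)
summary(m1)

# Restricted likelihood ratio test of the single random strain intercept,
# with the null distribution simulated from 10000 draws.
rlrt <- exactRLRT(m1, nsim = 10000)
rlrt
```

`exactRLRT()` with a single model argument applies when the model has exactly one random effect, which is the case for a lone `(1 | strain)` term.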
\[mouse \ av \ metric ~=~ subsp * sex * strain + \varepsilon\]
Above is the table of coefficients for MM.3 (HQ data set, lm function), in which sex, strain and subsp are interacting fixed effects.
Strain G is significant (p = 0.0005),
as are the interaction terms of sex and strain for G, MSM and PWD.
How are these models different from the last one? MM4 is missing subsp, so it has only the sex * strain effects.
\[mouse \ av \ metric ~=~ sex * strain + \varepsilon\]
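Because each strain belongs to exactly one subspecies, the subsp terms are linear combinations of the strain terms. A small simulated example (made-up data, hypothetical column names) shows the consequence: `lm()` reports the aliased coefficients as `NA` ("not defined because of singularities"), and the model with subsp fits exactly as well as the one without it.

```r
set.seed(2)

# 8 strains nested in 3 subspecies, 5 simulated mice per strain-by-sex cell.
strains  <- c("WSB", "G", "LEW", "PWD", "SKIVE", "KAZ", "MSM", "MOLF")
subsp_of <- c(WSB = "Dom", G = "Dom", LEW = "Dom", PWD = "Musc",
              SKIVE = "Musc", KAZ = "Musc", MSM = "Mol", MOLF = "Mol")
toy <- expand.grid(strain = strains, sex = c("female", "male"),
                   rep = 1:5, stringsAsFactors = FALSE)
toy$subsp  <- unname(subsp_of[toy$strain])
toy$metric <- rnorm(nrow(toy), mean = 25)

full    <- lm(metric ~ subsp * sex * strain, data = toy)
reduced <- lm(metric ~ sex * strain, data = toy)

# 48 design columns but only 16 strain-by-sex cells, so 32 terms are aliased.
sum(is.na(coef(full)))

# Both parameterizations give the same fitted values (and the same R-squared).
all.equal(fitted(full), fitted(reduced))
```

This is why the `subsp * sex * strain` and `sex * strain` summaries report identical residual standard errors and R-squared values; only the labelling of the coefficients differs.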
Histograms might not be the best way to represent this, but they show the general pattern that female mice have higher within-mouse variation for CO counts.
A general pattern is that the female data have higher within-mouse variance (both var and cV), some strains more than others (investigate potential outliers). (The LEW females with the top 3 variances are all from different dissection dates.) This pattern also holds for the within-mouse cV of MLH1 count (not shown).
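For reference, the two within-mouse measures could be computed from a cell-level table along these lines. The table `cells` and its columns are hypothetical, and expressing cV as a percentage (100 * sd / mean) is an assumption based on the magnitude of the cV values in the model output.

```r
# Toy cell-level table: one row per cell with its MLH1 focus count.
cells <- data.frame(
  mouse = rep(c("m1", "m2"), each = 4),
  nMLH1 = c(24, 26, 25, 29, 22, 23, 23, 24)
)

mouse_mean <- tapply(cells$nMLH1, cells$mouse, mean)
mouse_var  <- tapply(cells$nMLH1, cells$mouse, var)

per_mouse <- data.frame(
  mouse   = names(mouse_mean),
  mean_co = as.numeric(mouse_mean),
  var     = as.numeric(mouse_var)
)

# Coefficient of variation as a percentage of the mouse mean (assumed scaling).
per_mouse$cV <- 100 * sqrt(per_mouse$var) / per_mouse$mean_co
per_mouse
```

The cV rescales the spread by the mean, so it helps check whether females are more variable than expected from their higher mean counts alone.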
todo
Test if dissection date is a more significant predictor than strain (for females) (Effect doc). Also test how the number of 0CO bivalents per cell affects within-mouse variance (lm).
Modeling Within Mouse Variance
(use the same models as for mean_co)
XXX (old 1.) HetC.MixedModel.HQ <- lme(mouse.metric ~ subsp * sex, data = DF.HetC.MixedModel.HQ, random = list(strain = pdDiag(~sex)))
1. (new 1) Reduced.strain.HQ <- lmer(mouse.metric ~ subsp * sex + (1 | strain), data = DF.HetC.MixedModel.HQ)
2. lm(mouse.metric ~ subsp * sex * strain, data = DF.HetC.MixedModel.HQ)
3. lm(mouse.metric ~ sex * strain, data = DF.HetC.MixedModel.HQ)
\[mouse \ metric ~=~ subsp * sex + rand(strain) + \varepsilon\]
## Linear mixed model fit by REML ['lmerMod']
## Formula: var ~ subsp * sex + (1 | strain)
## Data: DF.HetC.MixedModel.HQ
##
## REML criterion at convergence: 843.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.1927 -0.5396 -0.0875 0.3379 3.3644
##
## Random effects:
## Groups Name Variance Std.Dev.
## strain (Intercept) 2.528 1.590
## Residual 30.989 5.567
## Number of obs: 137, groups: strain, 8
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 15.1483 1.3184 11.490
## subspMol -0.4944 2.3839 -0.207
## subspMusc -1.9539 2.0133 -0.971
## sexmale -7.9235 1.4128 -5.608
## subspMol:sexmale 0.5890 2.8447 0.207
## subspMusc:sexmale 1.5953 2.1583 0.739
##
## Correlation of Fixed Effects:
## (Intr) sbspMl sbspMs sexmal sbspMl:
## subspMol -0.553
## subspMusc -0.655 0.362
## sexmale -0.476 0.263 0.312
## sbspMl:sxml 0.236 -0.547 -0.155 -0.497
## sbspMsc:sxm 0.312 -0.172 -0.526 -0.655 0.325
##
## simulated finite sample distribution of RLRT.
##
## (p-value based on 10000 simulated values)
##
## data:
## RLRT = 2.4256, p-value = 0.0391
## Linear mixed model fit by REML ['lmerMod']
## Formula: cV ~ subsp * sex + (1 | strain)
## Data: DF.HetC.MixedModel.HQ
##
## REML criterion at convergence: 654.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2750 -0.5172 -0.0026 0.5824 2.4419
##
## Random effects:
## Groups Name Variance Std.Dev.
## strain (Intercept) 0.547 0.7396
## Residual 7.321 2.7058
## Number of obs: 137, groups: strain, 8
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 14.0930 0.6276 22.456
## subspMol -0.2205 1.1369 -0.194
## subspMusc -0.4109 0.9598 -0.428
## sexmale -3.6022 0.6867 -5.246
## subspMol:sexmale -0.4417 1.3782 -0.320
## subspMusc:sexmale -0.5531 1.0484 -0.528
##
## Correlation of Fixed Effects:
## (Intr) sbspMl sbspMs sexmal sbspMl:
## subspMol -0.552
## subspMusc -0.654 0.361
## sexmale -0.486 0.268 0.318
## sbspMl:sxml 0.242 -0.553 -0.158 -0.498
## sbspMsc:sxm 0.318 -0.176 -0.535 -0.655 0.326
##
## simulated finite sample distribution of RLRT.
##
## (p-value based on 10000 simulated values)
##
## data:
## RLRT = 2.1053, p-value = 0.0525
## boundary (singular) fit: see ?isSingular
##
## simulated finite sample distribution of RLRT.
##
## (p-value based on 10000 simulated values)
##
## data:
## RLRT = 0, p-value = 1
##
## simulated finite sample distribution of RLRT.
##
## (p-value based on 10000 simulated values)
##
## data:
## RLRT = 0.0012543, p-value = 0.3892
The fixed sex effect has the largest effect sizes, and all are negative for males. I’m not sure whether I should switch to testing the additive fixed effects by themselves; I’m pretty sure the sex effect would get very large in those models. The strain effects are only marginally significant for the within-mouse variance measures in the full data set (which includes low quality cells).
\[mouse \ CO \ metric ~=~ subsp * sex * strain + \varepsilon\] (M2/M3)
##
## Call:
## lm(formula = var ~ subsp * sex * strain, data = DF.HetC.MixedModel.HQ)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.3561 -2.5283 -0.5661 1.5918 15.5428
##
## Coefficients: (32 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.615 1.475 7.877 1.63e-12 ***
## subspMol 6.333 5.711 1.109 0.269681
## subspMusc 3.197 2.357 1.356 0.177528
## sexmale -4.646 2.284 -2.034 0.044181 *
## strainG 2.702 2.171 1.245 0.215670
## strainLEW 9.151 2.357 3.882 0.000169 ***
## strainPWD -2.264 2.326 -0.973 0.332364
## strainMSM -3.300 5.711 -0.578 0.564428
## strainMOLF NA NA NA NA
## strainSKIVE -5.010 5.816 -0.861 0.390667
## strainKAZ NA NA NA NA
## subspMol:sexmale -7.324 6.382 -1.148 0.253410
## subspMusc:sexmale -4.144 3.372 -1.229 0.221429
## subspMol:strainG NA NA NA NA
## subspMusc:strainG NA NA NA NA
## subspMol:strainLEW NA NA NA NA
## subspMusc:strainLEW NA NA NA NA
## subspMol:strainPWD NA NA NA NA
## subspMusc:strainPWD NA NA NA NA
## subspMol:strainMSM NA NA NA NA
## subspMusc:strainMSM NA NA NA NA
## subspMol:strainMOLF NA NA NA NA
## subspMusc:strainMOLF NA NA NA NA
## subspMol:strainSKIVE NA NA NA NA
## subspMusc:strainSKIVE NA NA NA NA
## subspMol:strainKAZ NA NA NA NA
## subspMusc:strainKAZ NA NA NA NA
## sexmale:strainG -3.368 3.244 -1.038 0.301239
## sexmale:strainLEW -8.024 3.599 -2.230 0.027610 *
## sexmale:strainPWD 4.371 3.462 1.263 0.209125
## sexmale:strainMSM 6.530 6.731 0.970 0.333906
## sexmale:strainMOLF NA NA NA NA
## sexmale:strainSKIVE 5.891 6.533 0.902 0.368989
## sexmale:strainKAZ NA NA NA NA
## subspMol:sexmale:strainG NA NA NA NA
## subspMusc:sexmale:strainG NA NA NA NA
## subspMol:sexmale:strainLEW NA NA NA NA
## subspMusc:sexmale:strainLEW NA NA NA NA
## subspMol:sexmale:strainPWD NA NA NA NA
## subspMusc:sexmale:strainPWD NA NA NA NA
## subspMol:sexmale:strainMSM NA NA NA NA
## subspMusc:sexmale:strainMSM NA NA NA NA
## subspMol:sexmale:strainMOLF NA NA NA NA
## subspMusc:sexmale:strainMOLF NA NA NA NA
## subspMol:sexmale:strainSKIVE NA NA NA NA
## subspMusc:sexmale:strainSKIVE NA NA NA NA
## subspMol:sexmale:strainKAZ NA NA NA NA
## subspMusc:sexmale:strainKAZ NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.517 on 121 degrees of freedom
## Multiple R-squared: 0.3993, Adjusted R-squared: 0.3249
## F-statistic: 5.363 on 15 and 121 DF, p-value: 3.682e-08
##
## Call:
## lm(formula = cV ~ subsp * sex * strain, data = DF.HetC.MixedModel.HQ)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.5135 -1.1837 -0.0016 1.3841 6.8193
##
## Coefficients: (32 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.12843 0.72726 18.052 < 2e-16 ***
## subspMol 2.21049 2.81668 0.785 0.43411
## subspMusc 1.63817 1.16261 1.409 0.16139
## sexmale -2.60550 1.12667 -2.313 0.02244 *
## strainG -0.01678 1.07050 -0.016 0.98752
## strainLEW 3.40510 1.16261 2.929 0.00407 **
## strainPWD -1.70760 1.14735 -1.488 0.13927
## strainMSM -1.68617 2.81668 -0.599 0.55053
## strainMOLF NA NA NA NA
## strainSKIVE -2.69574 2.86837 -0.940 0.34919
## strainKAZ NA NA NA NA
## subspMol:sexmale -2.72817 3.14775 -0.867 0.38782
## subspMusc:sexmale -2.51431 1.66292 -1.512 0.13314
## subspMol:strainG NA NA NA NA
## subspMusc:strainG NA NA NA NA
## subspMol:strainLEW NA NA NA NA
## subspMusc:strainLEW NA NA NA NA
## subspMol:strainPWD NA NA NA NA
## subspMusc:strainPWD NA NA NA NA
## subspMol:strainMSM NA NA NA NA
## subspMusc:strainMSM NA NA NA NA
## subspMol:strainMOLF NA NA NA NA
## subspMusc:strainMOLF NA NA NA NA
## subspMol:strainSKIVE NA NA NA NA
## subspMusc:strainSKIVE NA NA NA NA
## subspMol:strainKAZ NA NA NA NA
## subspMusc:strainKAZ NA NA NA NA
## sexmale:strainG -0.65614 1.59988 -0.410 0.68245
## sexmale:strainLEW -2.92297 1.77482 -1.647 0.10217
## sexmale:strainPWD 1.50426 1.70739 0.881 0.38005
## sexmale:strainMSM 1.30668 3.31949 0.394 0.69454
## sexmale:strainMOLF NA NA NA NA
## sexmale:strainSKIVE 2.67843 3.22206 0.831 0.40745
## sexmale:strainKAZ NA NA NA NA
## subspMol:sexmale:strainG NA NA NA NA
## subspMusc:sexmale:strainG NA NA NA NA
## subspMol:sexmale:strainLEW NA NA NA NA
## subspMusc:sexmale:strainLEW NA NA NA NA
## subspMol:sexmale:strainPWD NA NA NA NA
## subspMusc:sexmale:strainPWD NA NA NA NA
## subspMol:sexmale:strainMSM NA NA NA NA
## subspMusc:sexmale:strainMSM NA NA NA NA
## subspMol:sexmale:strainMOLF NA NA NA NA
## subspMusc:sexmale:strainMOLF NA NA NA NA
## subspMol:sexmale:strainSKIVE NA NA NA NA
## subspMusc:sexmale:strainSKIVE NA NA NA NA
## subspMol:sexmale:strainKAZ NA NA NA NA
## subspMusc:sexmale:strainKAZ NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.721 on 121 degrees of freedom
## Multiple R-squared: 0.4086, Adjusted R-squared: 0.3353
## F-statistic: 5.573 on 15 and 121 DF, p-value: 1.664e-08
##
## Call:
## lm(formula = var ~ subsp * sex * strain, data = Q12_mouse_table_w.Ages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.646 -3.164 -1.087 1.905 23.611
##
## Coefficients: (33 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 18.9977 2.6527 7.162 1.68e-10 ***
## subsp3 -5.5930 3.4246 -1.633 0.105745
## subspMol -3.6958 5.6273 -0.657 0.512925
## sexmale -13.4472 3.5092 -3.832 0.000228 ***
## strainG -3.7571 3.2978 -1.139 0.257445
## strainLEW 4.5656 3.6150 1.263 0.209698
## strainPWD -3.1455 2.9855 -1.054 0.294753
## strainMSM 0.1353 4.5946 0.029 0.976578
## strainMOLF NA NA NA NA
## strainSKIVE 4.4733 6.8493 0.653 0.515263
## strainKAZ NA NA NA NA
## subsp3:sexmale 7.1286 4.6580 1.530 0.129242
## subspMol:sexmale 2.8985 5.1370 0.564 0.573918
## subsp3:strainG NA NA NA NA
## subspMol:strainG NA NA NA NA
## subsp3:strainLEW NA NA NA NA
## subspMol:strainLEW NA NA NA NA
## subsp3:strainPWD NA NA NA NA
## subspMol:strainPWD NA NA NA NA
## subsp3:strainMSM NA NA NA NA
## subspMol:strainMSM NA NA NA NA
## subsp3:strainMOLF NA NA NA NA
## subspMol:strainMOLF NA NA NA NA
## subsp3:strainSKIVE NA NA NA NA
## subspMol:strainSKIVE NA NA NA NA
## subsp3:strainKAZ NA NA NA NA
## subspMol:strainKAZ NA NA NA NA
## sexmale:strainG 5.1118 4.4712 1.143 0.255793
## sexmale:strainLEW -4.2891 5.0382 -0.851 0.396730
## sexmale:strainPWD 3.0422 4.4313 0.687 0.494051
## sexmale:strainMSM NA NA NA NA
## sexmale:strainMOLF NA NA NA NA
## sexmale:strainSKIVE -5.7354 7.7491 -0.740 0.461037
## sexmale:strainKAZ NA NA NA NA
## subsp3:sexmale:strainG NA NA NA NA
## subspMol:sexmale:strainG NA NA NA NA
## subsp3:sexmale:strainLEW NA NA NA NA
## subspMol:sexmale:strainLEW NA NA NA NA
## subsp3:sexmale:strainPWD NA NA NA NA
## subspMol:sexmale:strainPWD NA NA NA NA
## subsp3:sexmale:strainMSM NA NA NA NA
## subspMol:sexmale:strainMSM NA NA NA NA
## subsp3:sexmale:strainMOLF NA NA NA NA
## subspMol:sexmale:strainMOLF NA NA NA NA
## subsp3:sexmale:strainSKIVE NA NA NA NA
## subspMol:sexmale:strainSKIVE NA NA NA NA
## subsp3:sexmale:strainKAZ NA NA NA NA
## subspMol:sexmale:strainKAZ NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.498 on 95 degrees of freedom
## Multiple R-squared: 0.4514, Adjusted R-squared: 0.3706
## F-statistic: 5.583 on 14 and 95 DF, p-value: 9.934e-08
##
## Call:
## lm(formula = cV ~ subsp * sex * strain, data = Q12_mouse_table_w.Ages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.7645 -1.6952 -0.3251 1.1253 11.3082
##
## Coefficients: (33 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.9160 1.2208 13.037 < 2e-16 ***
## subsp3 -2.1013 1.5760 -1.333 0.1856
## subspMol -1.1336 2.5897 -0.438 0.6626
## sexmale -7.0587 1.6150 -4.371 3.16e-05 ***
## strainG -2.5841 1.5176 -1.703 0.0919 .
## strainLEW 0.6301 1.6637 0.379 0.7057
## strainPWD -2.2640 1.3740 -1.648 0.1027
## strainMSM -1.5120 2.1145 -0.715 0.4763
## strainMOLF NA NA NA NA
## strainSKIVE 1.9035 3.1521 0.604 0.5474
## strainKAZ NA NA NA NA
## subsp3:sexmale 2.5285 2.1436 1.180 0.2411
## subspMol:sexmale 0.4125 2.3641 0.174 0.8619
## subsp3:strainG NA NA NA NA
## subspMol:strainG NA NA NA NA
## subsp3:strainLEW NA NA NA NA
## subspMol:strainLEW NA NA NA NA
## subsp3:strainPWD NA NA NA NA
## subspMol:strainPWD NA NA NA NA
## subsp3:strainMSM NA NA NA NA
## subspMol:strainMSM NA NA NA NA
## subsp3:strainMOLF NA NA NA NA
## subspMol:strainMOLF NA NA NA NA
## subsp3:strainSKIVE NA NA NA NA
## subspMol:strainSKIVE NA NA NA NA
## subsp3:strainKAZ NA NA NA NA
## subspMol:strainKAZ NA NA NA NA
## sexmale:strainG 4.0256 2.0576 1.956 0.0534 .
## sexmale:strainLEW -0.2134 2.3186 -0.092 0.9269
## sexmale:strainPWD 1.6210 2.0393 0.795 0.4287
## sexmale:strainMSM NA NA NA NA
## sexmale:strainMOLF NA NA NA NA
## sexmale:strainSKIVE -2.5564 3.5662 -0.717 0.4752
## sexmale:strainKAZ NA NA NA NA
## subsp3:sexmale:strainG NA NA NA NA
## subspMol:sexmale:strainG NA NA NA NA
## subsp3:sexmale:strainLEW NA NA NA NA
## subspMol:sexmale:strainLEW NA NA NA NA
## subsp3:sexmale:strainPWD NA NA NA NA
## subspMol:sexmale:strainPWD NA NA NA NA
## subsp3:sexmale:strainMSM NA NA NA NA
## subspMol:sexmale:strainMSM NA NA NA NA
## subsp3:sexmale:strainMOLF NA NA NA NA
## subspMol:sexmale:strainMOLF NA NA NA NA
## subsp3:sexmale:strainSKIVE NA NA NA NA
## subspMol:sexmale:strainSKIVE NA NA NA NA
## subsp3:sexmale:strainKAZ NA NA NA NA
## subspMol:sexmale:strainKAZ NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.99 on 95 degrees of freedom
## Multiple R-squared: 0.4911, Adjusted R-squared: 0.4161
## F-statistic: 6.549 on 14 and 95 DF, p-value: 4.533e-09
For LEW, females must have high variance; for cV, the LEW strain effect alone is elevated. Consider plotting the variance and diving deeper into this pattern. Most of the strain-specific differences in within-mouse variance go away in the Q12 data set.
\[mouse \ CO \ metric ~=~ sex * strain + \varepsilon\]
##
## Call:
## lm(formula = var ~ sex * strain, data = DF.HetC.MixedModel.HQ)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.3561 -2.5283 -0.5661 1.5918 15.5428
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.6151 1.4746 7.877 1.63e-12 ***
## sexmale -4.6455 2.2844 -2.034 0.044181 *
## strainG 2.7015 2.1705 1.245 0.215670
## strainLEW 9.1510 2.3573 3.882 0.000169 ***
## strainPWD 0.9331 2.0503 0.455 0.649869
## strainMSM 3.0326 2.0854 1.454 0.148468
## strainMOLF 6.3329 5.7110 1.109 0.269681
## strainSKIVE -1.8131 5.7110 -0.317 0.751426
## strainKAZ 3.1972 2.3573 1.356 0.177528
## sexmale:strainG -3.3679 3.2439 -1.038 0.301239
## sexmale:strainLEW -8.0239 3.5986 -2.230 0.027610 *
## sexmale:strainPWD 0.2272 3.3246 0.068 0.945625
## sexmale:strainMSM -0.7945 3.8734 -0.205 0.837827
## sexmale:strainMOLF -7.3241 6.3823 -1.148 0.253410
## sexmale:strainSKIVE 1.7469 6.4613 0.270 0.787336
## sexmale:strainKAZ -4.1441 3.3717 -1.229 0.221429
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.517 on 121 degrees of freedom
## Multiple R-squared: 0.3993, Adjusted R-squared: 0.3249
## F-statistic: 5.363 on 15 and 121 DF, p-value: 3.682e-08
##
## Call:
## lm(formula = cV ~ sex * strain, data = DF.HetC.MixedModel.HQ)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.5135 -1.1837 -0.0016 1.3841 6.8193
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.12843 0.72726 18.052 < 2e-16 ***
## sexmale -2.60550 1.12667 -2.313 0.02244 *
## strainG -0.01678 1.07050 -0.016 0.98752
## strainLEW 3.40510 1.16261 2.929 0.00407 **
## strainPWD -0.06943 1.01122 -0.069 0.94538
## strainMSM 0.52432 1.02851 0.510 0.61113
## strainMOLF 2.21049 2.81668 0.785 0.43411
## strainSKIVE -1.05756 2.81668 -0.375 0.70797
## strainKAZ 1.63817 1.16261 1.409 0.16139
## sexmale:strainG -0.65614 1.59988 -0.410 0.68245
## sexmale:strainLEW -2.92297 1.77482 -1.647 0.10217
## sexmale:strainPWD -1.01006 1.63971 -0.616 0.53905
## sexmale:strainMSM -1.42149 1.91037 -0.744 0.45826
## sexmale:strainMOLF -2.72817 3.14775 -0.867 0.38782
## sexmale:strainSKIVE 0.16411 3.18671 0.051 0.95901
## sexmale:strainKAZ -2.51431 1.66292 -1.512 0.13314
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.721 on 121 degrees of freedom
## Multiple R-squared: 0.4086, Adjusted R-squared: 0.3353
## F-statistic: 5.573 on 15 and 121 DF, p-value: 1.664e-08
##
## Call:
## lm(formula = var ~ sex * strain, data = Q12_mouse_table_w.Ages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.646 -3.164 -1.087 1.905 23.611
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 18.9977 2.6527 7.162 1.68e-10 ***
## sexmale -13.4472 3.5092 -3.832 0.000228 ***
## strainG -3.7571 3.2978 -1.139 0.257445
## strainLEW 4.5656 3.6150 1.263 0.209698
## strainPWD -8.7385 3.3555 -2.604 0.010686 *
## strainMSM -3.5605 3.2489 -1.096 0.275889
## strainMOLF -0.7973 3.9791 -0.200 0.841627
## strainSKIVE -1.1197 7.0184 -0.160 0.873588
## strainKAZ -5.5930 3.4246 -1.633 0.105745
## sexmale:strainG 5.1118 4.4712 1.143 0.255793
## sexmale:strainLEW -4.2891 5.0382 -0.851 0.396730
## sexmale:strainPWD 10.1708 4.7506 2.141 0.034838 *
## sexmale:strainMSM 2.8985 5.1370 0.564 0.573918
## sexmale:strainMOLF NA NA NA NA
## sexmale:strainSKIVE 1.3932 7.9360 0.176 0.861021
## sexmale:strainKAZ 7.1286 4.6580 1.530 0.129242
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.498 on 95 degrees of freedom
## Multiple R-squared: 0.4514, Adjusted R-squared: 0.3706
## F-statistic: 5.583 on 14 and 95 DF, p-value: 9.934e-08
##
## Call:
## lm(formula = cV ~ sex * strain, data = Q12_mouse_table_w.Ages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.7645 -1.6952 -0.3251 1.1253 11.3082
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.91601 1.22079 13.037 < 2e-16 ***
## sexmale -7.05868 1.61496 -4.371 3.16e-05 ***
## strainG -2.58410 1.51764 -1.703 0.09189 .
## strainLEW 0.63009 1.66366 0.379 0.70573
## strainPWD -4.36531 1.54419 -2.827 0.00573 **
## strainMSM -2.64563 1.49516 -1.769 0.08002 .
## strainMOLF -0.72118 1.83119 -0.394 0.69459
## strainSKIVE -0.19775 3.22991 -0.061 0.95131
## strainKAZ -2.10127 1.57603 -1.333 0.18563
## sexmale:strainG 4.02557 2.05764 1.956 0.05336 .
## sexmale:strainLEW -0.21336 2.31859 -0.092 0.92687
## sexmale:strainPWD 4.14951 2.18625 1.898 0.06073 .
## sexmale:strainMSM 0.41246 2.36405 0.174 0.86187
## sexmale:strainMOLF NA NA NA NA
## sexmale:strainSKIVE -0.02787 3.65219 -0.008 0.99393
## sexmale:strainKAZ 2.52850 2.14364 1.180 0.24113
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.99 on 95 degrees of freedom
## Multiple R-squared: 0.4911, Adjusted R-squared: 0.4161
## F-statistic: 6.549 on 14 and 95 DF, p-value: 4.533e-09
The general pattern is that sex is a significant effect in all of the models. The sex-subsp interaction p values were larger and only significant for one of the models.
(some strains have significant effect – some LEW females had large leverage).
mouse average MLH1 count
Multiple models were fit to evaluate potential evolutionary models for the mouse average CO counts (the results were mixed and nuanced).
First full mixed model: no significant fixed factors (the random effect might be significant).
In the second model, where strain was nested and random, the fixed effects of sex and the subsp-by-sex interaction are much more significant. The coefficients indicate that males in general have about 1 fewer on average, and that the Musc and Mol subspecies have ~3 more on average.
Variance effect
** Do these patterns hold across Quality bins? **
work more on figure 1. Main figure, MLH1 strain avs on top, chrm proportion bar plot on bottom
## Warning: NAs introduced by coercion
What are the steps that each chunk walks through?
Follow up on this finding – included more strains which were excluded from the HetC analysis.
– what’s the driving question for this section –
is there a strain effect in Dom?
How different are the Musc strains?
test a model for fixed effects of strains within musc. The purpose is to distinguish musc strains from low ‘Dom’ level strains.
##
## WSB G LEW PERC PWD MSM MOLF SKIVE KAZ TOM AST CZECH
## 12 18 10 1 8 8 6 6 13 2 3 3
## CAST HMI
## 2 4
## Warning: Removed 1 rows containing missing values (geom_point).
This is the table and plot of new Musc strains
| Var1 | Freq |
|---|---|
| PWD | 8 |
| SKIVE | 6 |
| KAZ | 13 |
| TOM | 2 |
| AST | 3 |
| CZECH | 3 |
Given the plot, make note of the potential outlier mice. The strains have unequal variance.
KAZ: 10jul19_KAZ_m3, 10jul19_KAZ_m1, 10jul19_KAZ_m2
PWD: 24sep14_PWD_m1, 18may15_PWD_m1?
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.45316667 0.4255831 57.4580305 5.148209e-68
## subspCast -0.09891667 0.8511662 -0.1162131 9.077678e-01
## subspMusc -1.25950000 0.9516327 -1.3235148 1.893425e-01
## subspMol -0.21316667 0.7371316 -0.2891840 7.731704e-01
## strainG -0.25627778 0.5494254 -0.4664469 6.421325e-01
## strainLEW 0.59053333 0.6312417 0.9355106 3.522722e-01
## strainPERC -1.64516667 1.5344617 -1.0721458 2.867999e-01
## strainPWD 6.11270833 0.9980808 6.1244622 3.003421e-08
## strainMSM 7.21425000 0.7961931 9.0609304 5.444214e-14
## strainSKIVE 3.88183333 1.0424614 3.7237189 3.590207e-04
## strainKAZ 1.03964103 0.9442841 1.1009833 2.741253e-01
## strainTOM 1.50633333 1.3458119 1.1192748 2.662909e-01
## strainAST 2.10533333 1.2037308 1.7490068 8.403173e-02
## strainCAST -1.34525000 1.2767493 -1.0536524 2.951369e-01
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.45316667 0.4255831 57.4580305 5.148209e-68
## strainG -0.25627778 0.5494254 -0.4664469 6.421325e-01
## strainLEW 0.59053333 0.6312417 0.9355106 3.522722e-01
## strainPERC -1.64516667 1.5344617 -1.0721458 2.867999e-01
## strainPWD 4.85320833 0.6729060 7.2123129 2.478331e-10
## strainMSM 7.00108333 0.6729060 10.4042522 1.192684e-16
## strainMOLF -0.21316667 0.7371316 -0.2891840 7.731704e-01
## strainSKIVE 2.62233333 0.7371316 3.5574835 6.255027e-04
## strainKAZ -0.21985897 0.5901776 -0.3725302 7.104593e-01
## strainTOM 0.24683333 1.1259870 0.2192151 8.270273e-01
## strainAST 0.84583333 0.9516327 0.8888233 3.766980e-01
## strainCZECH -1.25950000 0.9516327 -1.3235148 1.893425e-01
## strainCAST -1.44416667 1.1259870 -1.2825784 2.032536e-01
## strainHMI -0.09891667 0.8511662 -0.1162131 9.077678e-01
MSM, PWD and SKIVE are significant when all the other male strains are added. SKIVE is less significant than the others, but it’s still significant. Given these results I can group SKIVE, PWD and MSM into the ‘high rec’ group.
PWD, MSM, SKIVE are in ‘high rec group’
make a smaller proportion chrm block –
These results are meant to compare the proportions of bivalents with 0, 1, 2 or 3 COs. Most of the variation in gwRR across strains is due to more 2COs at the ‘expense’ (trade-off) of 1COs (the gwRR for most mice is low, close to the minimum).
This is also another way to score heterochiasmy (one that gets around the sex chromosomes, I think).
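The proportion of bivalents in each CO class can be tabulated per category as in the sketch below. The table `biv`, its column names and its values are invented for illustration only.

```r
# Toy bivalent-level table: one row per bivalent with its CO count.
biv <- data.frame(
  category = rep(c("WSB male", "PWD male"), each = 6),
  n_co     = c(0, 1, 1, 1, 1, 2,
               1, 1, 1, 2, 2, 2)
)

# Count bivalents per category and CO class, keeping empty classes (e.g. 3CO).
counts <- table(biv$category, factor(biv$n_co, levels = 0:3))

# Convert to within-category proportions; each row sums to 1.
props <- prop.table(counts, margin = 1)
round(props, 2)
```

Fixing the factor levels to `0:3` keeps a column for every CO class, so categories stay comparable even when one of them has no 0CO or 3CO bivalents.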
The data below are from hand measured cells (not from the bivalent data). Generate a smaller data set of chrm proportions (0CO, 1CO, 2CO, etc.) with 10 cells from each Musc category. Add MOLF females.
## Warning: Removed 39 rows containing missing values (geom_col).
High Musc males have half 1CO and half 2CO bivalents. To follow up on the bivalent results, find ways to compare the 1COs to the 2COs in Dom and Musc (sis-co-ten, length, distance from centromere).
Add p values from tests of how the proportions differ.
*cast female data came from another publication, add this ref (Kohler?)
The bottom lines and subspecies points don’t exactly match up.
add in cell images like the other draft figures.
The figure below shows the average percentages of chromosome classes per category.
These results align with the previous figure I made using percentages from hand measured results (dozens of observations, instead of hundreds). These results fit with the rapid male evolution of gwRR in MSM and PWD.
These data are meant to test if CO precursors, double strand breaks (DSBs), are significant predictors for MLH1 count variation. Simply, do mice with more DSBs also have more COs?
ToDo: - attempt to supplement data with more KAZ DMC1 cell observations. - change all the bad variable names
The counts look close to normal.
Potentially figure 2. Using a function to display the p-values for mean comparison
Summary:
The distributions of counts are close to normal (also the higher count is better than the MLH1 counts)
Above figure shows p-values for comparison of means across the cell stages.
Foci of DMC1 were scored from Leptotene and Zygotene spermatocytes of juvenile mice (12-14-18 days). One mouse represents each strain. The main comparison to examine is that between the high (PWD and MSM) and low recombining (WSB, G, KAZ) strains.
ToDo think of ways to add in the compared means for the high and low groups (across cell stage) how similar are G-WSB or PWD-MSM to each other?
The numbers of observations are not quite equal across strains and cell stages, so I ran permutations sampling 25 cells from the high and low recombining groups of cells. Below are the distributions of p values for permuted t-tests from sampling 25 cells from the pooled high and low recombining groups. Permutations were done for both cell stages. The actual p value is in red.
The observed p value does not fall outside of the permuted distribution, which assumes equal cell sampling.
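The equal-sampling check described above could look roughly like this. The counts are simulated, the group sizes and effect size are invented, and the number of repeats (1000) is an assumption; strictly speaking this is repeated subsampling of 25 cells per group and not a label-shuffling permutation test.

```r
set.seed(3)

# Simulated DMC1 focus counts with deliberately unequal group sizes.
high <- rnorm(60, mean = 230, sd = 30)   # cells from high recombining strains
low  <- rnorm(95, mean = 200, sd = 30)   # cells from low recombining strains

# t-test on all cells, i.e. with unequal sampling.
observed_p <- t.test(high, low)$p.value

# Draw 25 cells per group without replacement and repeat the t-test.
resampled_p <- replicate(1000, {
  t.test(sample(high, 25), sample(low, 25))$p.value
})

# Where does the observed p value sit relative to the equal-sampling draws?
quantile(resampled_p, c(0.025, 0.5, 0.975))
mean(resampled_p <= observed_p)
```

Plotting `resampled_p` as a histogram with a vertical line at `observed_p` would reproduce the kind of figure described in the text.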
At what stage are DMC1 foci numbers most correlated with MLH1?
The correlation with MLH1 and leptotene cells is 0.8736143.
The correlation with MLH1 and zygotene cells is 0.284302.
Evidence for non-equal variance across strains for zygotene cells.
Ideas for post-hoc tests: remove strains 1 at a time and test variance. Test variance for L cells without KAZ.
From ANOVA analysis strain is significant for Leptotene cells but not Zygotene cells. (but take these results with a grain of salt before assumptions are tested.)
Divide the Musculus strains into high and low recombining groups to characterize the difference in distributions. The p-values from the t-tests of the high vs low groups indicate that the high recombining group is significantly higher for the L cells (p value = `r ttest.HighLow.L$p.value`), while for the Z cells there is no significant difference between the high and low groups (p value = `r ttest.HighLow.Z$p.value`).
I present heterochiasmy as a comparison of oocyte to spermatocyte MLH1 counts, but the sex chromosomes/bivalents complicate this comparison. In females the XX bivalent is indistinguishable from the autosomes. To the meiotic recombination machinery, it is an autosome and has a similar REC landscape.
Whereas in spermatocytes the XY bivalent is visually distinct and any MLH1 foci were not included in the count (I note whether the X and Y are paired, which they are at a high rate). The XY pair triggers a response to unpaired chromosomes and only has MLH1 foci within the PAR (the tips of the X and Y).
To make a more equivalent comparison I will estimate which bivalent is the XX in oocytes, and subtract that average REC from the category average of each strain.
According to mouse genome website, the X is the 3rd largest chromosome by total amount of DNA (Mb).
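One way this adjustment might be sketched is to treat the third-longest bivalent in each oocyte as the XX, on the assumption that SC length rank tracks physical chromosome size. That assumption, the table `oo` and its column names are all hypothetical, and the toy cells have 4 bivalents each to keep the example short.

```r
# Toy oocyte bivalent table: 2 cells with 4 bivalents each.
oo <- data.frame(
  cell      = rep(c("c1", "c2"), each = 4),
  sc_length = c(120, 110, 100, 90,
                115, 125, 95, 105),
  n_co      = c(2, 2, 1, 1,
                2, 2, 1, 2)
)

# Rank bivalents by SC length within each cell (1 = longest).
oo$len_rank <- ave(-oo$sc_length, oo$cell, FUN = rank)

# Mean CO count on the bivalent assumed to be the XX (rank 3).
x_proxy_co <- mean(oo$n_co[oo$len_rank == 3])

# Mean per-cell CO total, before and after removing the XX estimate.
cell_totals <- tapply(oo$n_co, oo$cell, sum)
adjusted    <- mean(cell_totals) - x_proxy_co
c(raw = mean(cell_totals), x_proxy = x_proxy_co, adjusted = adjusted)
```

Ties in SC length would give fractional ranks with `rank()`, so real data may need an explicit tie-breaking rule before selecting rank 3.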
There is now MOLF, which has female biased hetC. 3 of my Musc strains have a male biased pattern: SKIVE, PWD and MSM. 1 of the Musc strains has female biased heterochiasmy: KAZ.
The mouse specific scatter plots aren’t shown here because they are too bulky. These plots are in a different document.
Making all of these scatter plots allows us to look at the whole distribution of the data for each mouse. The distance of the red line from the black could be an indicator of slides or mice with slide specific technical noise.
Bivalent level traits and metrics have been added in the src/Setup_BivData.Rmd script. These observations are from the automated image analysis algorithm and have been curated (threw out incorrect algorithm output). The MLH1 data file is also loaded into this file.
The breakdown of curated bivalents by category
Q1. driving questions, which traits are sexually dimorphic?
Q2. which traits fit male polymorphism predictions?
Q1. same models for MLH1 based sex differences (and evolution)
Q2. the same models for the male polymorphism, plus logistic regression to test whether bivalent traits can predict if they are from the high or low rec group
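A hedged sketch of the logistic regression idea for Q2, using simulated male mouse averages. The predictor `sc_length`, the group means and the sample sizes are all invented; the real analysis would use the curated bivalent metrics.

```r
set.seed(4)

# Simulated male mouse averages for one bivalent trait (made-up values).
males <- data.frame(rec_group = rep(c("low", "high"), each = 30))
males$sc_length <- rnorm(60,
                         mean = ifelse(males$rec_group == "high", 82, 78),
                         sd = 5)

# Binary response: 1 = high rec group, 0 = low rec group.
males$is_high <- as.integer(males$rec_group == "high")

# Logistic regression: does the trait predict group membership?
fit <- glm(is_high ~ sc_length, data = males, family = binomial)
summary(fit)$coefficients

# Predicted probability of the high rec group for a given trait value.
predict(fit, newdata = data.frame(sc_length = 85), type = "response")
```

The slope for the trait is on the log-odds scale, so `exp(coef(fit))` gives the odds ratio per unit change in the trait.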
I decided to use the mouse averages for most of the bivalent metric tests. Some of my logic is below.
Pro Bivalent level
The data are continuous and closer to normal/Gaussian expectations than the MLH1 counts.
There is within cell variance that is an issue (would affect single bivalent observations).
Pro Mouse averages
Mouse average is a conservative choice, it would summarize general patterns which fits with the paper.
Simple application of the same models as the MLH1 based results.
Limited work to add to the data set if I can justify that the collection of single bivalents are random and equivalent across mice.
So the mouse averages for all the metrics are what is used in the mixed models.
In the chunk above the mouse averages table is made – may need to add all the extra metrics (IFD, .
Using the Mixed model framework which tests the effects (and interactions) of subspecies, sex and strain, I will test for evolution of the following traits.
Two bivalent level traits are predicted to display heterochiasmy (ie significant effects of sex);
2.A) Normalized 1CO positions will be sexually dimorphic (Sardell and Kirkpatrick).
sister cohesion tension (sis-co-ten) will be sexually dimorphic as it reflects the general property of uniform vs telomere/biased CO positioning.
Centromere and telomere distances will be sexually dimorphic.
We expect female SC lengths to be longer (refs).
In the plot above, SC length increases with the hand-counted foci number. Any chromosome classes above 3 that don’t have a higher mean are likely due to low sample size.
Most of the 0CO class chromosomes are around the same size as the 1CO distribution.
Add in the code for the plot under 2COs males and females
The dependent variables I’ll be testing in the mixed model framework are:
Pooled SC lengths
1CO
2CO
3CO
long.biv
\[mouse \ average \ SC \ length ~=~ subsp * sex + rand(strain) + \varepsilon \]
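The model above can be sketched in code as follows. This is an illustration under stated assumptions, not the notebook's actual chunk: the data are simulated, the variable name mean_SC and all effect sizes are invented, and the lme4 and RLRsim packages are assumed to be installed (the fit is skipped if they are not).

```r
# Simulated mouse averages; every number here is invented for illustration
set.seed(1)
strains <- data.frame(
  strain = c("WSB", "G", "LEW", "PWD", "SKIVE", "KAZ", "MSM", "MOLF"),
  subsp  = c("Dom", "Dom", "Dom", "Musc", "Musc", "Musc", "Mol", "Mol")
)
# merge() with no shared columns gives all 16 strain-by-sex combinations
design <- merge(strains, data.frame(sex = c("female", "male")))
toy <- design[rep(seq_len(nrow(design)), each = 5), ]   # 5 mice per cell
strain_shift <- setNames(rnorm(nrow(strains), sd = 3), strains$strain)
toy$mean_SC <- 80 + 10 * (toy$sex == "female") +
  unname(strain_shift[as.character(toy$strain)]) + rnorm(nrow(toy), sd = 2)
# All effects are unordered factors
for (v in c("subsp", "sex", "strain")) toy[[v]] <- factor(toy[[v]])

fit <- NULL
if (requireNamespace("lme4", quietly = TRUE) &&
    requireNamespace("RLRsim", quietly = TRUE)) {
  # Fixed effects: subsp * sex; random intercept for strain; REML fit
  fit <- lme4::lmer(mean_SC ~ subsp * sex + (1 | strain), data = toy, REML = TRUE)
  print(anova(fit))                             # fixed-effect terms
  print(RLRsim::exactRLRT(fit, nsim = 10000))   # simulated test of the strain variance
}
```

The same call can be repeated with each dependent variable in place of mean_SC.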
Insert the results for the MMs using the dependent variable 1CO. Use the mixed model (lmer) for all of the metrics (pooled SC, 1CO, 2CO, etc.).
Below is the code for the mixed model results; try to organize them into a table or something similar.
##
## simulated finite sample distribution of RLRT.
##
## (p-value based on 10000 simulated values)
##
## data:
## RLRT = 6.0335, p-value = 0.0042
##
## simulated finite sample distribution of RLRT.
##
## (p-value based on 10000 simulated values)
##
## data:
## RLRT = 4.2988, p-value = 0.0133
##
## simulated finite sample distribution of RLRT.
##
## (p-value based on 10000 simulated values)
##
## data:
## RLRT = 0.074804, p-value = 0.3099
##
## simulated finite sample distribution of RLRT.
##
## (p-value based on 10000 simulated values)
##
## data:
## RLRT = 0.10951, p-value = 0.2911
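One possible way to organize test results like the ones above into a table. The helper htest_table is hypothetical, and the t-tests on made-up numbers are only stand-ins so the sketch runs without extra packages. It relies on the test objects being of class "htest" with statistic and p.value components, which is how the RLRsim documentation describes the return value of exactRLRT().

```r
# Collect the statistic and p-value from a named list of "htest" objects
htest_table <- function(tests) {
  data.frame(
    trait     = names(tests),
    statistic = vapply(tests, function(x) unname(x$statistic)[1], numeric(1)),
    p.value   = vapply(tests, function(x) x$p.value, numeric(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Stand-in tests on made-up numbers; in the notebook these would be the
# objects returned by exactRLRT() for each dependent variable
set.seed(2)
demo_tests <- list(
  trait_A = t.test(rnorm(10, mean = 1), rnorm(10)),
  trait_B = t.test(rnorm(10), rnorm(10))
)
print(htest_table(demo_tests))
```

Naming the list elements after the dependent variables keeps each RLRT result attached to its trait.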
All fixed and all interactions
Dive deeper into the sex-specific pattern for each strain. (I think that for all the within-strain comparisons across sexes, the general pattern of females being longer is met.)
(adding all of these blocks to – the setup script); strain.Bivalents.DF
The general predictions across the males and subspecies based on the above MLH1 results.
For positive correlation traits/metrics
in DOM strains, low to no difference across strains
in Musc, PWD > SKIVE > KAZ, CZECH all the others
in Mol, MSM > MOLF
B.1) Interference/IFD will be shorter in high Rec strains. Use IFD_PER to account for SC length differences.
C.1) Not enough is known about variation within species for the 1CO normalized positions. Null prediction: no difference in the ‘telomeric pattern’.
The mouse averages for the other position metrics will be highly influenced by the proportions of 1CO and 2CO bivalents. When chromosome class and SC length are accounted for, there won’t be a difference; however, not enough is known about these patterns.
C.2) sis-co-ten metric … (what about the clustering?)
C.3) telomere and centromere distances …
\[mouse \ average \ SC \ length ~=~ rand(strain) + \varepsilon \]
\[mouse \ average \ IFD \ trait ~=~ rand(strain) + \varepsilon \] The IFD traits are:
IFD_ABS
IFD_PER
\[mouse \ average \ CO \ position \ trait ~=~ rand(strain) + \varepsilon \]
These are the models for male polymorphism (I am re-thinking these tests).
Use these DFs for the models: Male.poly.Mouse.Table_BivData_4MM and Long.biv_mouse_table_male.poly
**Add chunks for glm()s of male polymorphism**
If I want to try to use mouse nested in strain, I should use cell-level metrics (but those have their own flaws).
Ideas for the above 3 tests,
Is sex a significant effect for SC length? (as predicted)
The results seem to indicate that sex is a significant effect for SC length. Consider writing a subsampling approach (randomize / permute a data set of BivData).
According to anova, the sex effect explains most of the variance in single bivalent SC lengths.
The Long Biv data set largely agrees with the full curated data set.
A caveat I haven’t addressed yet is the XX in the female Biv data averages.
Is the random effect of strain a significant effect on SC length?
The exactRLRT() test indicates that the random strain effect might be significant, p= xx
(merge this with the code above) Prediction: high Rec males have longer SC.
##
## Call:
## glm(formula = Rec.group ~ mean_SC, family = binomial(link = "logit"),
## data = Male.poly.Mouse.Table_BivData_4MM[(Male.poly.Mouse.Table_BivData_4MM$subsp ==
## "Musc"), ])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6584 -1.3640 0.7203 0.9847 1.1296
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -10.1979 12.7787 -0.798 0.425
## mean_SC 0.1332 0.1570 0.848 0.396
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 22.915 on 17 degrees of freedom
## Residual deviance: 21.993 on 16 degrees of freedom
## AIC: 25.993
##
## Number of Fisher Scoring iterations: 4
##
## Call:
## glm(formula = Rec.group ~ mean.SC_1CO, family = binomial(link = "logit"),
## data = Male.poly.Mouse.Table_BivData_4MM[(Male.poly.Mouse.Table_BivData_4MM$subsp ==
## "Musc"), ])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4116 -0.7298 0.1792 0.6717 1.6075
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 43.5380 23.8961 1.822 0.0685 .
## mean.SC_1CO -0.5540 0.3057 -1.812 0.0700 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 22.915 on 17 degrees of freedom
## Residual deviance: 15.971 on 16 degrees of freedom
## AIC: 19.971
##
## Number of Fisher Scoring iterations: 6
##
## Call:
## glm(formula = Rec.group ~ mean.SC_2CO, family = binomial(link = "logit"),
## data = Male.poly.Mouse.Table_BivData_4MM[(Male.poly.Mouse.Table_BivData_4MM$subsp ==
## "Musc"), ])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7969 -1.2826 0.6881 0.9783 1.0859
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -9.8176 10.8084 -0.908 0.364
## mean.SC_2CO 0.1115 0.1151 0.969 0.333
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 22.915 on 17 degrees of freedom
## Residual deviance: 21.846 on 16 degrees of freedom
## AIC: 25.846
##
## Number of Fisher Scoring iterations: 4
##
## Call:
## glm(formula = Rec.group ~ mean.SC_3CO, family = binomial(link = "logit"),
## data = Male.poly.Mouse.Table_BivData_4MM[(Male.poly.Mouse.Table_BivData_4MM$subsp ==
## "Musc"), ])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6137 -1.0045 0.5407 1.0467 1.2163
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -9.00755 9.97630 -0.903 0.367
## mean.SC_3CO 0.09471 0.10158 0.932 0.351
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 13.460 on 9 degrees of freedom
## Residual deviance: 12.073 on 8 degrees of freedom
## (8 observations deleted due to missingness)
## AIC: 16.073
##
## Number of Fisher Scoring iterations: 5
##
## Call:
## glm(formula = Rec.group ~ chromosomeLength, family = binomial(link = "logit"),
## data = Curated_BivData[(Curated_BivData$subsp == "Musc") &
## (Curated_BivData$sex == "male"), ])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5025 -1.3434 0.9579 1.0092 1.1089
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.017439 0.173746 -0.100 0.92005
## chromosomeLength 0.005474 0.002073 2.641 0.00826 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3389 on 2526 degrees of freedom
## Residual deviance: 3382 on 2525 degrees of freedom
## AIC: 3386
##
## Number of Fisher Scoring iterations: 4
##
## Call:
## glm(formula = Rec.group ~ chromosomeLength, family = binomial(link = "logit"),
## data = Curated_BivData[(Curated_BivData$subsp == "Musc") &
## (Curated_BivData$hand.foci.count == 1) & (Curated_BivData$sex ==
## "male"), ])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5098 -1.2592 0.9561 1.0676 1.3887
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.195761 0.220890 5.413 6.18e-08 ***
## chromosomeLength -0.012263 0.002783 -4.406 1.05e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2134.0 on 1556 degrees of freedom
## Residual deviance: 2114.3 on 1555 degrees of freedom
## (219 observations deleted due to missingness)
## AIC: 2118.3
##
## Number of Fisher Scoring iterations: 4
##
## Call:
## glm(formula = Rec.group ~ chromosomeLength, family = binomial(link = "logit"),
## data = Curated_BivData[(Curated_BivData$subsp == "Musc") &
## (Curated_BivData$hand.foci.count == 2) & (Curated_BivData$sex ==
## "male"), ])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9063 0.6221 0.6510 0.6704 0.7184
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.946531 0.554150 1.708 0.0876 .
## chromosomeLength 0.005022 0.005823 0.862 0.3884
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 678.45 on 687 degrees of freedom
## Residual deviance: 677.71 on 686 degrees of freedom
## (219 observations deleted due to missingness)
## AIC: 681.71
##
## Number of Fisher Scoring iterations: 4
##
## Call:
## glm(formula = Rec.group ~ chromosomeLength, family = binomial(link = "logit"),
## data = Curated_BivData[(Curated_BivData$subsp == "Musc") &
## (Curated_BivData$hand.foci.count == 3) & (Curated_BivData$sex ==
## "male"), ])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3371 -1.2384 0.9749 1.1077 1.2306
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.19049 2.78251 -0.428 0.669
## chromosomeLength 0.01367 0.02828 0.483 0.629
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 38.673 on 27 degrees of freedom
## Residual deviance: 38.437 on 26 degrees of freedom
## (219 observations deleted due to missingness)
## AIC: 42.437
##
## Number of Fisher Scoring iterations: 4
The mean SC logistic regression coefficients (intercept, slope) for mouse averages are
-10.1979118, 0.1331859
and for the single bivalent level they are
-0.0174394, 0.0054741
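As a reading aid, the mouse-average coefficients can be converted to an odds ratio and to fitted probabilities. This is only a sketch of how to interpret the scale of the fit: the slope in that model is not significant (p = 0.396), and the SC lengths plugged in below are arbitrary illustrative values, not observations.

```r
# Coefficients copied from the mouse-average mean_SC fit above
b0 <- -10.1979118
b1 <- 0.1331859

odds_ratio <- exp(b1)     # multiplicative change in odds per unit of mean_SC
midpoint   <- -b0 / b1    # mean_SC at which the fitted probability is 0.5

# Fitted probability of the high Rec group at a few illustrative SC lengths
# (70, 75 and 80 are arbitrary choices, not values from the data)
sc_len <- c(70, 75, 80)
p_high <- plogis(b0 + b1 * sc_len)

print(round(c(odds_ratio = odds_ratio, midpoint = midpoint), 3))
print(round(setNames(p_high, sc_len), 3))
```

The single-bivalent coefficients can be read the same way by swapping in their intercept and slope.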
What are the pairwise t-test functions? Use pairwise.t.test(Ozone, Month, p.adj = "bonf").
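A short sketch of the same call applied to strain comparisons. The data frame toy_sc and its values are invented for illustration; only pairwise.t.test() itself, from base R's stats package, is taken from the note above.

```r
# Made-up mouse-average SC lengths for three strains (illustration only)
set.seed(3)
toy_sc <- data.frame(
  strain  = factor(rep(c("PWD", "SKIVE", "KAZ"), each = 6)),
  mean_SC = c(rnorm(6, mean = 90, sd = 3),
              rnorm(6, mean = 85, sd = 3),
              rnorm(6, mean = 80, sd = 3))
)

# All pairwise strain comparisons with a Bonferroni correction
pw <- pairwise.t.test(toy_sc$mean_SC, toy_sc$strain,
                      p.adjust.method = "bonferroni")
print(pw$p.value)   # lower-triangular matrix of adjusted p-values
```

By default the function pools the standard deviation across groups; pool.sd = FALSE switches to separate two-sample tests.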
The mouse average and single bivalents are significantly longer in the high rec group, p = `r T.test_SC.Len_mouse.av$p.value` and p = `r T.test_SC.Len_single.biv$p.value`.
# Long Biv comparison
# Hold off on reporting / working with the long SC biv results
long.SC.mouse.avs_4MM_male <- Long_biv_mouse.avs_4MM[Long_biv_mouse.avs_4MM$sex == "male", ]
long.SC.mouse.avs_4MM_Mmale <- long.SC.mouse.avs_4MM_male[long.SC.mouse.avs_4MM_male$subsp == "Musc", ]
# Divide males into Rec groups: PWD and MSM males are high (1), all others low (0)
long.SC.mouse.avs_4MM_male$Rec.group <- ifelse(grepl("PWD male", long.SC.mouse.avs_4MM_male$category), 1,
                                               ifelse(grepl("MSM male", long.SC.mouse.avs_4MM_male$category), 1, 0))
# Logistic regression on all males (the Musc-only subset above is not used here)
long.biv.log.reg <- glm(Rec.group ~ mean_SC,
                        data = long.SC.mouse.avs_4MM_male, family = binomial(link = "logit"))
summary(long.biv.log.reg)  # NS effect (likely underpowered)
When all male mice are used, the predictive power is greater than when just the Musc strains are used. When just the Musc strains are used, the mouse mean SC is at best weakly predictive of whether a mouse is in the high or low Rec group (p = 0.396 for mean_SC and p = 0.070 for mean.SC_1CO in the Musc-only fits above). Should I consider running this on females too?
Is the prediction that high Rec Musc male strains have longer SC met?
In a logistic regression, mouse average SC length is only slightly predictive of whether a mouse is in a high or low Rec strain. I couldn’t get the mixed models working for the male polymorphism predictions…
Brief lit review for IFD / interference expectations
Petkov et al 2007, CO interference underlies sex differences in RR
“Here we show that in mice, this is because of a shorter genomic interference distance in females than in males, measured in Mb. However, the interference distance is the same in terms of bivalent length. We propose a model in which the interference distance in the two sexes reflects the compaction of chromosomes at the pachytene stage of meiosis.”
Chrm1 genetic map,
Other human and mouse refs:
Tease and Hulten 2004: no difference between MLH1 foci in males and females
DeBoer 2006, 13: measured chrm 1 in males and females, both were 2.8 microns
Do I believe the physical/SC results?
B.1) Interference/IFD will be shorter in high Rec strains to allow more foci to fit on a single bivalent.
Interfocal distance (IFD), an indication of CO interference.
The raw measures are pretty noisy; this might not be the best way to display these results.
Each point is an IFD observation. Maybe add mouse-level lines for the means.
Still working on the best way to display the general IFD patterns.
Mixed model analysis for IFD (interference). The first set of models is fit with the lmer() function.
\[mouse \ average \ IFD ~=~ subsp * sex + rand(strain) + \varepsilon \]
##
## simulated finite sample distribution of RLRT.
##
## (p-value based on 10000 simulated values)
##
## data:
## RLRT = 0.51642, p-value = 0.1708
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## Warning in optwrap(optimizer, devfun, x@theta, lower = x@lower, calc.derivs
## = TRUE, : convergence code 3 from bobyqa: bobyqa -- a trust region step
## failed to reduce q
##
## simulated finite sample distribution of RLRT.
##
## (p-value based on 10000 simulated values)
##
## data:
## RLRT = 0, p-value = 1
Sex is a slightly significant factor for raw and normalized IFD.
The table above should display the slightly unusual pattern where the coefficients for the significant sex fixed effect are positive and negative for the raw and normalized values, respectively. That is, for the raw IFD values females are significantly longer, but for the normalized IFD values males are significantly longer.
show a raw plot, and norm plot and a scatter plot
I tested 2 versions of the mixed model for this flavor of trait: the raw IFD and the normalized IFD measure. The tables below are from anova() for the lmer model. The random effect of strain is not significant for ABS IFD, and only slightly significant for IFD.PER.
Dive deeper into the sex-specific pattern for each strain. Below are code chunks which show the unusual sex-specific results for the IFD measures. The general pattern is that female raw IFD > male raw IFD, and female PER IFD < male PER IFD. The scatter plots show that female raw measures are longer than male, and for the PER values the female mean is brought down by an enrichment of short IFDs.
For some strains, PWD, MSM and SKIVE there’s a 30% threshold in the male PER IFD distributions. (What does that mean?). How do I test / quantify this pattern? Cluster metric?
The ranges of normalized IFDs overlap more closely in males and females in the WSB data.
The LEW pattern doesn’t have a clean cut-off of nrm.IFD. The ranges of males and females overlap, but there are more female observations below.
For PWD, there are a few observations of short IFDs for males, but there seems to be a cut-off / threshold at 0.3.
For KAZ, the distinction between the male and female patterns is less clear. There are fewer instances of females with very close IFDs.
In the SKIVE data, it could be the case that the very short IFD measures in females are rare / another class of observations.
The MSM pattern has a short range of nrm.IFD in males and a longer range in females.
The strains which show a clean “30% threshold” for normalized IFD in males are PWD, SKIVE, and MSM (the 2 high Rec strains and an intermediate strain). The other strains, which have more overlap between males and females, are the Dom strains and KAZ.
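One simple way to quantify the threshold pattern is to compare the fraction of normalized IFDs below 0.3 between the sexes within a strain. The sketch below uses invented data and a hypothetical column name (IFD_PER as a 0 to 1 proportion). It also treats every IFD as independent, which ignores the mouse and cell structure, so a mouse-level proportion would be the more conservative version.

```r
# Made-up normalized IFDs for one strain (illustration only)
set.seed(4)
toy_ifd <- data.frame(
  sex     = rep(c("female", "male"), each = 60),
  IFD_PER = c(runif(60, min = 0.05, max = 0.90),   # females: short IFDs present
              runif(60, min = 0.30, max = 0.90))   # males: nothing below 0.3
)

threshold <- 0.3
toy_ifd$short <- factor(toy_ifd$IFD_PER < threshold, levels = c(FALSE, TRUE))

# 2 x 2 table of sex by short/long IFD, then a test of association
counts <- table(sex = toy_ifd$sex, short = toy_ifd$short)
print(counts)
print(prop.table(counts, margin = 1))   # fraction of short IFDs within each sex
print(fisher.test(counts)$p.value)
```

Running this per strain would give a comparable number for each of the patterns described above.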
Run comparisons for 3CO bivalents.
The polymorphism frame-work
Prediction A: is the male polymorphism prediction met? Do high Rec strains have shorter IFDs?
Logistic regression: \[ Rec \ Group ~=~ mouse \ average \ IFD \] For these tests I am looking at the single IFD measure from 2CO bivalents.
The logistic regression compares high and low groups (there need to be 2 groups). I think I should compare 2 sets of logistic models: one where SKIVE (intermediate gwRR) is in the high group and one where it is in the low group.
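The two groupings can be sketched as below. The data are simulated with no real relationship between IFD and group, and the names toy_males, mean_IFD, Rec.group_hi and Rec.group_lo are hypothetical; only the glm() call mirrors the logistic regressions used elsewhere in these notes.

```r
# Made-up male mouse averages (illustration only)
set.seed(5)
toy_males <- data.frame(
  strain   = rep(c("PWD", "MSM", "SKIVE", "KAZ", "WSB", "LEW"), each = 5),
  mean_IFD = rnorm(30, mean = 50, sd = 5)
)

# Grouping 1: SKIVE counted as high Rec; grouping 2: SKIVE counted as low Rec
toy_males$Rec.group_hi <- as.integer(toy_males$strain %in% c("PWD", "MSM", "SKIVE"))
toy_males$Rec.group_lo <- as.integer(toy_males$strain %in% c("PWD", "MSM"))

fit_hi <- glm(Rec.group_hi ~ mean_IFD, data = toy_males,
              family = binomial(link = "logit"))
fit_lo <- glm(Rec.group_lo ~ mean_IFD, data = toy_males,
              family = binomial(link = "logit"))

# Slope estimate, standard error, z and p for each grouping
print(rbind(SKIVE_high = coef(summary(fit_hi))["mean_IFD", ],
            SKIVE_low  = coef(summary(fit_lo))["mean_IFD", ]))
```

If the conclusion changes depending on where SKIVE is placed, the high/low split itself is driving the result.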
## [1] 0.0002170853
## [1] 4.377303e-06
## Warning: Removed 1821 rows containing non-finite values (stat_boxplot).
## Warning: Removed 1821 rows containing missing values (geom_point).
## Warning: Removed 1813 rows containing non-finite values (stat_boxplot).
## Warning: Removed 1813 rows containing missing values (geom_point).
Re-order the above figure if possible.
Neither t-test is significant for ABS or PER when I test just the Musc strains. The above t-tests are breaking knitr.
The t.tests for IFD1s at the bivalent level for the high and low musc males are significant.
None of the logistic regression models for ABS or PER IFD lengths are significant, even when just the Musc strains are used.
Post-hoc comparisons ideas?
Preliminary results from an independent data set indicated that PWD had longer IFDs, which goes against the simple prediction that more COs mean denser spacing of foci on the same bivalent. This also indicated that interference distance may have evolved in the house mouse complex.
Prediction B: is there evidence for the alternative IFD measure?
There are a few traits that fall within the CO positions
1CO normalized position
sis.co.ten (sorta interference)
centromere and telomere distance
Heterochiasmy
2.A) Normalized 1CO positions will be sexually dimorphic (Sardell and Kirkpatrick).
Sister cohesion tension (sis-co-ten) will be sexually dimorphic, as it reflects the general property of uniform vs telomere-biased CO positioning.
Centromere and telomere distances will be sexually dimorphic.
Polymorphism
C.1) It’s unknown whether 1CO normalized positions will differ between high and low Musc strains (no good predictions come to mind).
C.2) sis-co-ten metric will be maximized in higher recombining strains .. because.
C.3) telomere and centromere distances will be shorter in low rec strains
Above plot focuses on the 1CO bivalent normalized positions since CO interference controls the general position of COs when there are multiple COs. This plot shows the sexual dimorphism in the density plots.
Consider adding annotate_text for the number of observations in each category. Think about adding vertical lines for the centromere and for the position means. Think about removing the extra Musc strains.
These box plots show that females have a much more medial position on single-focus bivalents (much closer to 50% compared to males). They also show that the Musc males’ Foci1 position is slightly more central / medial compared to the same type of positions in the Dom male strains. MOLF males have much more medial positions than other strains.
The distributions of SC lengths and sis-co-ten seem very different across sexes.
\[mouse \ average \ F1 \ position ~=~ subsp * sex + rand(strain) + \varepsilon \]
The mixed model data should only come from 1CO bivalent data.
The mouse average Foci1 position is more significant in the t-test than in the logistic regression… (is something wrong?) Check the mouse averages for F1_pos; there might be an outlier or a mouse with very few observations.
The metric Sis-co-ten measures the amount of sister cohesion connected to the other pole.
The logic of how the sis-co-ten metric is calculated is outlined in the figure below. The goal is to use this metric to model different amounts of tension-active cohesion as a consequence of different numbers and placements of chiasmata/COs. This metric is calculated using SC area as a proxy for the amount of cohesion at metaphase.
from (Lee, J. (2019). Is age-related increase of chromosome segregation errors in mammalian oocytes caused by cohesin deterioration?. Reproductive Medicine and Biology.)
## Warning: Removed 133 rows containing missing values (geom_point).
## Warning: Removed 46 rows containing missing values (geom_point).
Males have much clearer separation of siscoten across chrm classes. This is emphasized when SC length is also plotted. It seems like musc males have higher amounts of this metric compared to Dom males (which might)
To formally test the differences in sis-co-ten I plan to write a sub sampling / permutation loop to compare the mean(sis.co.ten) of the same numbers of bivalents of the same class.
BUT females have a greater range – so maybe it’s just a scale issue.
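A minimal sketch of the subsampling / permutation idea described above. The function perm_mean_diff and the simulated values are hypothetical; it compares the mean of a metric between two groups of bivalents from the same chromosome class after subsampling both groups to the same size.

```r
# Permutation test for a difference in mean sis-co-ten between two groups,
# subsampling the same number of bivalents from each group first
perm_mean_diff <- function(x, y, n_sub = min(length(x), length(y)), n_perm = 2000) {
  x <- sample(x, n_sub)                      # equalize the number of bivalents
  y <- sample(y, n_sub)
  observed <- mean(x) - mean(y)
  pooled <- c(x, y)
  perm <- replicate(n_perm, {
    idx <- sample(length(pooled), n_sub)     # random relabeling of the groups
    mean(pooled[idx]) - mean(pooled[-idx])
  })
  # Two-sided p-value; the +1 counts the observed labeling as one permutation
  p <- (sum(abs(perm) >= abs(observed)) + 1) / (n_perm + 1)
  list(observed = observed, p.value = p)
}

# Made-up sis-co-ten values for 1CO bivalents in two groups (illustration only)
set.seed(6)
musc_1CO <- rnorm(120, mean = 40, sd = 8)
dom_1CO  <- rnorm(90,  mean = 35, sd = 8)
res <- perm_mean_diff(musc_1CO, dom_1CO)
print(res)
```

To address the scale concern, the same function could be run on a normalized version of the metric instead of the raw values.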
## Warning: Removed 24 rows containing missing values (geom_point).
## Warning: Removed 21 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing missing values (geom_point).
## Warning: Removed 16 rows containing missing values (geom_point).
## Warning: Removed 23 rows containing missing values (geom_point).
## Warning: Removed 16 rows containing missing values (geom_point).
## Warning: Removed 8 rows containing missing values (geom_point).
I think the normalized sis.co.ten plots also show that there is more clustering of the sis.co.ten for the males.
The fixed effects, sex and sex*subsp are significant. The random strain effect is also significant.
Is the heterochiasmy prediction met?
Yes, in the model predicting mouse average siscoten, sex and the sex-subsp interaction are significant factors. The random strain effect is also significant.
##
## Call:
## glm(formula = Rec.group ~ mean.siscoten, family = binomial(link = "logit"),
## data = Male.poly.Mouse.Table_BivData_4MM[(Male.poly.Mouse.Table_BivData_4MM$subsp ==
## "Musc"), ])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.52564 -0.06612 0.00415 0.07733 1.40452
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -47.2084 30.9084 -1.527 0.127
## mean.siscoten 1.4513 0.9482 1.531 0.126
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 22.9145 on 17 degrees of freedom
## Residual deviance: 5.2122 on 16 degrees of freedom
## AIC: 9.2122
##
## Number of Fisher Scoring iterations: 9
##
## Call:
## glm(formula = Rec.group ~ SisCoTen, family = binomial(link = "logit"),
## data = Curated_BivData[(Curated_BivData$subsp == "Musc") &
## (Curated_BivData$sex == "male"), ])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9129 -1.2756 0.8089 0.9942 1.1671
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.024197 0.077458 0.312 0.755
## SisCoTen 0.015678 0.001955 8.019 1.07e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2978.7 on 2269 degrees of freedom
## Residual deviance: 2910.9 on 2268 degrees of freedom
## (37 observations deleted due to missingness)
## AIC: 2914.9
##
## Number of Fisher Scoring iterations: 4
The bivalent-level sis.co.ten test is highly significant, although the mouse-average logistic regression above is not (p = 0.126). Maybe I should consider running a normalized sis.co.ten? I think nrm_siscoten would still reflect the differing cohesion structure/outcome.
My metric for telomere and centromere distance measures the distance of the nearest focus to the ends of the bivalent (SC). In the plots below each point is a single bivalent. I chose not to use the mark for the centromere because it seems noisy and inconsistent…
## Warning: Removed 69 rows containing missing values (geom_point).
## Warning: Removed 82 rows containing missing values (geom_point).
Males on average have much lower raw telomere distance (reflects the telomere bias) compared to females. In Males, 2CO bivalents have very low telomere distances, while the 1CO bivalents have a greater range. In females the ranges of telomere distances have much more overlap.
##
## simulated finite sample distribution of RLRT.
##
## (p-value based on 10000 simulated values)
##
## data:
## RLRT = 6.7045, p-value = 0.0034
Mixed model result summary:
##
## Call:
## glm(formula = Rec.group ~ mean.telo.dist, family = binomial(link = "logit"),
## data = Mouse.Table_BivData_4MM[(Mouse.Table_BivData_4MM$subsp ==
## "Musc") & (Mouse.Table_BivData_4MM$sex == "male"), ])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8147 0.4434 0.5946 0.7428 0.8505
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.5876 5.9959 0.765 0.444
## mean.telo.dist -0.1719 0.3153 -0.545 0.586
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 15.012 on 14 degrees of freedom
## Residual deviance: 14.673 on 13 degrees of freedom
## (10369 observations deleted due to missingness)
## AIC: 18.673
##
## Number of Fisher Scoring iterations: 4
##
## Call:
## glm(formula = Rec.group ~ telo_dist, family = binomial(link = "logit"),
## data = Curated_BivData[(Curated_BivData$subsp == "Musc") &
## (Curated_BivData$sex == "male"), ])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4890 -1.3882 0.9169 0.9519 1.2709
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.708076 0.066551 10.640 < 2e-16 ***
## telo_dist -0.008411 0.002351 -3.577 0.000347 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3038.1 on 2303 degrees of freedom
## Residual deviance: 3025.3 on 2302 degrees of freedom
## (3 observations deleted due to missingness)
## AIC: 3029.3
##
## Number of Fisher Scoring iterations: 4
##
## Call:
## glm(formula = Rec.group ~ telo_dist_PER, family = binomial(link = "logit"),
## data = Curated_BivData[(Curated_BivData$subsp == "Musc") &
## (Curated_BivData$sex == "male"), ])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4869 -1.3823 0.9169 0.9533 1.1257
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.70338 0.06927 10.154 < 2e-16 ***
## telo_dist_PER -0.65966 0.20272 -3.254 0.00114 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3038.1 on 2303 degrees of freedom
## Residual deviance: 3027.5 on 2302 degrees of freedom
## (3 observations deleted due to missingness)
## AIC: 3031.5
##
## Number of Fisher Scoring iterations: 4
## Warning: Removed 126 rows containing missing values (geom_point).
## Warning: Removed 126 rows containing missing values (geom_point).
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
##
## Call:
## glm(formula = Rec.group ~ mean.cent.dist, family = binomial(link = "logit"),
## data = Mouse.Table_BivData_4MM[(Mouse.Table_BivData_4MM$subsp ==
## "Musc") & (Mouse.Table_BivData_4MM$sex == "male"), ])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -6.308e-05 2.100e-08 2.100e-08 2.100e-08 6.409e-05
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3373.04 1607241.50 0.002 0.998
## mean.cent.dist -79.92 38084.76 -0.002 0.998
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1.5012e+01 on 14 degrees of freedom
## Residual deviance: 8.0864e-09 on 13 degrees of freedom
## (10369 observations deleted due to missingness)
## AIC: 4
##
## Number of Fisher Scoring iterations: 25
##
## Call:
## glm(formula = Rec.group ~ dis.cent, family = binomial(link = "logit"),
## data = Curated_BivData[(Curated_BivData$subsp == "Musc") &
## (Curated_BivData$sex == "male"), ])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7903 -1.2795 0.7927 0.9501 1.6059
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.30679 0.09495 13.763 <2e-16 ***
## dis.cent -0.01884 0.00206 -9.149 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2980.5 on 2271 degrees of freedom
## Residual deviance: 2894.0 on 2270 degrees of freedom
## (35 observations deleted due to missingness)
## AIC: 2898
##
## Number of Fisher Scoring iterations: 4
##
## Call:
## glm(formula = Rec.group ~ dis.cent.PER, family = binomial(link = "logit"),
## data = Curated_BivData[(Curated_BivData$subsp == "Musc") &
## (Curated_BivData$sex == "male"), ])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7903 -1.2589 0.7835 0.9731 1.2327
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.30601 0.09884 13.213 <2e-16 ***
## dis.cent.PER -1.48646 0.17045 -8.721 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2980.5 on 2271 degrees of freedom
## Residual deviance: 2902.1 on 2270 degrees of freedom
## (35 observations deleted due to missingness)
## AIC: 2906.1
##
## Number of Fisher Scoring iterations: 4
The normalized centromere plots show that in Musc males, on 2CO bivalents the 1st CO is closer to the centromere end than in Dom males.
Females have more overlap in the distributions of centromere distances across chromosome class compared to males.
Is sex a significant effect for the 1CO normalized CO position? (as predicted)
Is the random strain effect significant?
The random strain effect seems very significant.
remember to use the mouse average table mouse.avs_4MM (I don’t think I need the MELT data frame)
#make point plots / boxplots which show differences in mean positions
#scatter + boxplot for t.tests
Dr. Broman suggested that the Kolmogorov-Smirnov (curve comparison) test wasn’t the best test for differences in general CO position. He suggested doing a simple t-test for the positions.
# Try remaking the plot Megan suggested:
# for 2CO bivalents, Foci1 position on x and Foci2 position on y
library(ggplot2)
# Isolate 2COs and drop rows with a missing or empty Foci2
CurBivData_2CO <- Curated_BivData[which(Curated_BivData$hand.foci.count == 2), ]
CurBivData_2CO <- CurBivData_2CO[!(is.na(CurBivData_2CO$Foci2) | CurBivData_2CO$Foci2 == ""), ]
# Facet by sex (TODO: also facet by subsp)
F1.x.F2 <- ggplot(CurBivData_2CO, aes(x = Foci1, y = Foci2, color = strain)) +
  geom_point() + facet_wrap(~sex) + ggtitle("test plot")
F1.x.F2
# What is the pattern of variance?
# Run analyses for each subsp * sex
# Use the non-melt DF
# How is the variance partitioned across cell, mouse, strain?
female.Dom <- Curated_BivData[Curated_BivData$sex == "female", ]
female.Dom <- female.Dom[female.Dom$subsp == "Dom", ]
female.Dom$Foc1.PER <- female.Dom$Foci1 / female.Dom$chromosomeLength
# Unorder strain and mouse; going through as.character() keeps the names as levels
female.Dom$mouse <- factor(as.character(female.Dom$mouse))
female.Dom$strain <- factor(as.character(female.Dom$strain))
# 1CO first; which() also drops rows with a missing hand.foci.count
female.Dom_1CO <- female.Dom[which(female.Dom$hand.foci.count == 1), ]
# Enter terms from coarsest to finest. With fileName first, mouse and strain got no
# sum of squares, presumably because each fileName belongs to a single mouse.
modo <- lm(Foc1.PER ~ strain + mouse + fileName, data = female.Dom_1CO)
# residual size decreases with per.F1
#residuals much larger than fileName, mouse and strain no
#model <- lm(breaks ~ wool * tension,
# data = warpbreaks,
# contrasts = list(wool = "contr.sum", tension = "contr.poly"))
male.Dom <- Curated_BivData[Curated_BivData$sex == "male", ]
male.Dom <- male.Dom[male.Dom$subsp == "Dom", ]
male.Dom$mouse <- factor(as.character(male.Dom$mouse))
male.Dom$strain <- factor(as.character(male.Dom$strain))
# 1CO only; which() also drops rows with a missing hand.foci.count
male.Dom <- male.Dom[which(male.Dom$hand.foci.count == 1), ]
male.Dom$Foc1.PER <- male.Dom$Foci1 / male.Dom$chromosomeLength
# "|" is not a nesting operator in an lm() formula, so the original
# fileName | mouse | strain did not fit a nested model (only fileName registered
# as an effect). Additive terms ordered from coarsest to finest give sequential
# sums of squares, assuming mouse and fileName labels are unique across strains.
male.modo <- lm(Foc1.PER ~ strain + mouse + fileName, data = male.Dom)
summary(aov(male.modo))
#Review ANOVA frameworks
#http://www.biostathandbook.com/nestedanova.html
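A small sketch of a nested ANOVA for partitioning variance across strain and mouse, in the spirit of the handbook page linked above. The data are simulated and every effect size is invented; the column names Foc1.PER, strain and mouse follow the code above, while toy_nested and mouse_in_strain are hypothetical.

```r
# Made-up nested data: 3 strains, 3 mice per strain, 10 bivalents per mouse
set.seed(7)
toy_nested <- expand.grid(rep = 1:10, mouse_in_strain = 1:3,
                          strain = c("WSB", "G", "LEW"))
toy_nested$mouse <- factor(paste(toy_nested$strain,
                                 toy_nested$mouse_in_strain, sep = "_"))
strain_eff <- c(WSB = 0.00, G = 0.05, LEW = -0.05)
mouse_eff  <- setNames(rnorm(nlevels(toy_nested$mouse), sd = 0.03),
                       levels(toy_nested$mouse))
toy_nested$Foc1.PER <- 0.6 +
  unname(strain_eff[as.character(toy_nested$strain)]) +
  unname(mouse_eff[as.character(toy_nested$mouse)]) +
  rnorm(nrow(toy_nested), sd = 0.1)

# strain/mouse expands to strain + strain:mouse (mouse nested within strain)
nested_fit <- aov(Foc1.PER ~ strain/mouse, data = toy_nested)
nested_tab <- summary(nested_fit)[[1]]
print(nested_tab)

# Share of the total sum of squares at each level
print(round(nested_tab[["Sum Sq"]] / sum(nested_tab[["Sum Sq"]]), 3))
```

Note that the default F for strain in this table uses the residual mean square as its denominator. In a nested design where mice are the replication unit, the strain mean square is usually compared with the mouse-within-strain mean square instead.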
I didn’t have a good prediction for the male polymorphism… So I’ll just do post-hoc comparisons for the groups to test if they are different.
Put all of the code chunks/analysis for caveats here.
I tried to isolate bivalents which are in the top quartile for SC length within their cells. (rethink where this section should go)
Below are examples of plots of SC length distributions across cells. The top figure shows whole cell hand measured data and the bottom shows the automated bivData from cells with at least 15 bivalents measured. Most plots excluded for space.
Each point is a bivalent, plotted by cell on the x axis. X’s are the 4th quartile, the big point is the mean and the smaller black point is the median. I’m using these to compare the patterns of these statistics in the automated data set, which is missing some bivalent data. (the extra stats are not correctly mapped)
## Warning: Removed 3 rows containing missing values (geom_point).
## Warning: Removed 9 rows containing missing values (geom_point).
## Warning: Removed 9 rows containing missing values (geom_point).
## Warning: Removed 9 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 41 rows containing missing values (geom_point).
## Warning: Removed 41 rows containing missing values (geom_point).
## Warning: Removed 41 rows containing missing values (geom_point).
All the plots above show the distributions of manually measured whole-cell SC lengths compared to the SC length distributions from the automated bivalent data. They show the amount of within-cell variance across strains. There is a bit of variance across the SC length distributions in the PWD females.
This data set might be noisy, given the amount of variance in the SC length distributions across cells (PWD females, WSB females).
The DF real.long.bivData contains 713 bivalent measures. The full data set has 9807. Below is the breakdown of bivalent observations by category for this long data set.
- Try to merge this DF with the whole.cell manual measures.
In the code chunk above I ran the mouse averages for the longest bivalents (680 bivalents from 54 mice; 10202 bivalents from 86 mice).
## Warning: Removed 3 rows containing missing values (geom_point).
These plots show the SC lengths for the ‘long SC data set’. They are supposed to be the longest 4-5 SCs from cells where I could get good measures. These longer bivalents are useful because their patterns shouldn’t be affected by the chromosome size effect (which affects CO position). Hopefully this data set will have less noise from chromosome identity, but there was still data missing (they don’t come from whole cell measures).
(rethink where this section should go) new outline
illustrate the problem (affects mostly SC length)
(prove the general pattern that ALL bivalents are longer), chromosomes sorted by bin comparisons
19 female, 19 male, 20 female, 20 male
The female mouse averages should be adjusted for the XX. I’m working on code to estimate the XX SC length from the 3rd largest bivalent in the female whole cell data across strains, then subtract this amount from the female mouse averages. This isn’t the best solution, since I can’t determine what proportion of the cells behind the female mouse averages include the XX (most cells are missing at least 3 bivalents).
Of all female single-bivalent observations, 5% are XX (1 of 20).
The XX is large, likely within the top 25% longest bivalents of the cell (3rd largest by Mb).
The average % of XX for whole cell SC (sum(all bivalents)) can be calculated from the whole.cell data set. Let’s guess 12% of a cell’s total SC area is XX.
I think a formula something like this can be applied to adjust for XX
The plots above show the mean SC lengths and 2SE error bars for single bivalents which have been assigned a within-cell rank.
The first plot shows the mean SC lengths by rank (most strains have 3 cells (observations); MSM has 5).
The purpose of these plots is to display the variance of single bivalents when they are assigned a within-cell rank. Among the longest bivalents, the XX is predicted to be the 3rd longest (according to physical length in Mb).
(use the value for the 3rd bivalent to adjust the single bivalent traits for XX, then compare to the male values, or re-run in the MM).
The other figure shows how much each single bivalent contributes to the total SC area. Each column is a cell and each color is the percent of total SC area for one of the longest 5 bivalents in that cell. On average, each of the longest bivalents makes up ~10% of the cell’s total SC area. So for cells with all 20 bivalents, 5-7% of the total SC area is due to the XX.
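The percent-of-total calculation behind that figure can be sketched as below, on simulated whole-cell data. The table `whole` and its columns are hypothetical. As a reference point, if all 20 bivalents in a cell were the same length, each would contribute 100/20 = 5% of the total.

```r
# Hypothetical whole-cell data: 3 cells with all 20 bivalents measured
set.seed(3)
whole <- data.frame(
  cell      = rep(c("c1", "c2", "c3"), each = 20),
  SC.length = runif(60, min = 40, max = 140)
)

# within-cell rank, 1 = longest bivalent in its cell
whole$rank <- ave(-whole$SC.length, whole$cell, FUN = rank)
# share of the cell's summed SC length carried by each bivalent
whole$pct.total <- 100 * whole$SC.length /
  ave(whole$SC.length, whole$cell, FUN = sum)

# mean share for the five longest ranks; rank 3 is the assumed XX position
top5 <- whole[whole$rank <= 5, ]
mean.pct.by.rank <- tapply(top5$pct.total, top5$rank, mean)
```

Reading off the rank 3 entry of `mean.pct.by.rank` gives the estimated share of total SC that would be attributed to the XX under this ranking assumption.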
Is the difference between cell averages for males and females less than 10%?
Also interesting: PWD and MSM don’t have longer SCs compared to the other strains.
##
##    WSB      G    LEW   PERC    PWD    MSM   MOLF  SKIVE    KAZ    TOM
##    767    726    714      0   1031    550      0      0      0      0
##    AST  CZECH   CAST    HMI  SPRET   SPIC CAROLI     F1  other
##      0      0      0      0      0      0      0      0      0
For the automated data set, I’d like to measure the rate of passing bivalents per cell. The mean pass rate will be multiplied by the estimated XX mean_SC.
The table above shows the number of bivalents from the same strains as in the manual whole cell data. The plot shows the bivalent passing rate across all of the individual cells from this female data set. For each strain, I’ll calculate the mean bivalent passing rate (maybe I should look at the mouse level).
(some of the mice have different ranges of per-cell passing rate) Given these ranges, I think the XX adjustment factor should be calculated at the mouse level. (it could even be extended to the cell level, except I don’t think the XX SC length estimates would be good.)
strain.XX.adjustment.factor = per_cell_passing rate * 1 of 20 random biv will be XX *
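A small sketch of the per-cell passing rate and its mouse-level mean, which is the first ingredient of the adjustment factor. The table `auto` is a toy example with one row per passing bivalent; the names and counts are invented, and the only assumption is that a complete female cell has 20 bivalents.

```r
# Toy automated data: one row per bivalent that passed, labelled by mouse and cell
auto <- data.frame(
  mouse = rep(c("m1", "m1", "m2", "m2", "m2"), times = c(17, 12, 19, 15, 20)),
  cell  = rep(c("m1_c1", "m1_c2", "m2_c1", "m2_c2", "m2_c3"),
              times = c(17, 12, 19, 15, 20))
)
n.expected <- 20  # bivalents in a complete female cell (19 autosomes + XX)

# count passing bivalents per cell, keeping the mouse label
per.cell <- aggregate(list(n.pass = auto$cell),
                      by = list(mouse = auto$mouse, cell = auto$cell),
                      FUN = length)
per.cell$pass.rate <- per.cell$n.pass / n.expected

# mean passing rate per mouse (could be averaged again per strain)
mouse.pass.rate <- tapply(per.cell$pass.rate, per.cell$mouse, mean)
```

Keeping the rate at the mouse level preserves the between-mouse spread in passing rate, which a single strain-level mean would hide.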
**It might be simpler to compare the male and female means, and test if the difference is greater than the whole-cell proportion of the XX in female cells.** The XX in a whole female cell contributes ~7% of total SC, which is the excess expected if the higher female means for a total SC measure are due to the XX. But I am not using ‘whole cell’ summaries to compare females and males.
What is the effect of an extra XX-autosome on single bivalent means?
Use a permutation approach: make a true data set to start with, with the same (or a similar) number of cells, mice and bivalents. Make fake data sets which sample 19 bivalents for ‘in silico’ cells for males and females. Also run a control female data set, where 20 bivalents are sampled, but randomly. Run the same bivalent-level summaries for each permuted data set: male avSC, 19Female_avSC, and rand.20_Female_avSC. The difference between the rand.20 and rand.19 female permuted data sets should indicate the influence of having an extra ‘XX-autosome’ in the total data set.
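One possible reading of this resampling scheme, as a sketch only. It builds in silico cells by drawing 19 or 20 bivalents at random from a simulated female pool and compares the resulting average SC lengths. The pool, `n.cells`, `n.perm` and the helper `sim.avSC` are all assumptions made for illustration. Because both arms draw from the same pool, the differences form a null distribution for the effect of one extra random bivalent; a version that specifically adds an XX-sized bivalent would need the real XX estimates.

```r
set.seed(5)
# Hypothetical pool of female single-bivalent SC lengths (arbitrary units)
female.pool <- runif(400, min = 40, max = 140)
n.cells <- 30   # in silico cells per data set (assumed)
n.perm  <- 200  # number of permuted data sets (assumed)

# average of per-cell mean SC length, for cells built from n.biv sampled bivalents
sim.avSC <- function(pool, n.biv, n.cells) {
  mean(replicate(n.cells, mean(sample(pool, n.biv))))
}

rand.19 <- replicate(n.perm, sim.avSC(female.pool, 19, n.cells))
rand.20 <- replicate(n.perm, sim.avSC(female.pool, 20, n.cells))

# spread of the difference expected from one extra randomly chosen bivalent
perm.diff <- rand.20 - rand.19
quantile(perm.diff, c(0.025, 0.5, 0.975))
```

An observed female minus male difference falling well outside this interval would suggest something beyond the count of sampled bivalents is driving it.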